Quantitative methods 2:
Data analysis

SKOC39: Introduction to research methods
and academic writing

nils.holmberg@isk.lu.se

Quantitative methods

    1. Experiments and Threats to Validity
    2. Survey Research, Questionnaire
    3. Quantitative Content Analysis

Course literature

Boyle and Schmierbach (2015)

Boyle and Schmierbach (2019)

Lectures and workshops

Data collection (Nov 12)

    1. Concept Explication and Measurement
    2. Reliability and Validity
    3. Effective Measurement
    4. Sampling
    5. Content Analysis

Exam question 1

Data analysis (Nov 26)

    1. Experiments and Threats to Validity
    2. Survey Research
    3. Descriptive Statistics
    4. Inferential Statistics
    5. Multivariate Statistics

Exam question 2

9. Experiments and Threats to Validity

  • Random Assignment (p. 225)
  • Between-Subjects Design (p. 227)
  • Within-Subjects Design (p. 228)
  • Treatment Groups (p. 233)
  • Stimulus (p. 233)
  • Control Group (p. 238)

Random Assignment (p. 225)

Between-Subjects Design (p. 227)

Within-Subjects Design (p. 228)

Treatment Groups (p. 233)

Stimulus (p. 233)

9. Experiments and Threats to Validity

  • Hawthorne effect: Changes in participants’ behavior occur simply because they are aware they are being observed.
  • History effect: External events during the experiment influence participants’ responses or outcomes.
  • Interparticipant bias: Participants influence each other in ways that affect the results, compromising validity.
  • Maturation: Natural changes in participants over time (e.g., aging, learning) affect the dependent variable independently of the experiment.
  • Mortality: Loss of participants from a study, potentially leading to biased results if dropouts are non-random.
  • Regression toward the mean: Extreme initial measurements naturally tend to move closer to the average on subsequent measurements.
  • Ceiling and floor effects: When measurements are constrained by an upper (ceiling) or lower (floor) limit, reducing the ability to detect changes or differences.
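
The regression-toward-the-mean threat is easy to demonstrate by simulation. A minimal sketch with hypothetical test scores (pure Python, fixed seed): select the top scorers on one sitting and retest them; their average falls back toward the population mean even though nothing about them changed.

```python
import random

random.seed(1)

# 1,000 test takers: stable true ability plus independent noise at two sittings.
ability = [random.gauss(100, 10) for _ in range(1000)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the top decile on the first test (an "extreme initial measurement").
cutoff = sorted(test1)[-100]
extreme = [i for i, t in enumerate(test1) if t >= cutoff]

mean1 = sum(test1[i] for i in extreme) / len(extreme)
mean2 = sum(test2[i] for i in extreme) / len(extreme)
print(round(mean1, 1), round(mean2, 1))  # the retest mean falls back toward 100
```

The selected group looks worse on retest purely because part of its first-test advantage was noise, not because of any treatment.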

Control Group (p. 238)

10. Survey Research

  • Polls (p. 262)
  • Cross-Sectional Design (p. 264)
  • Internet Survey (p. 273)

Polls (p. 262)

  • Definition: Polls are structured sets of questions designed to gather opinions or attitudes from a group of individuals.
  • Explanation: Often used in political, social, or market research to measure public opinion or trends.
  • Key Features:
    • Representative sampling is critical for accuracy.
    • Questions must be clear and unbiased to avoid leading responses.
  • Examples:
    • Pre-election polls predicting voter preferences.
    • Customer satisfaction surveys after a product purchase.
    • Community sentiment on policy changes (e.g., local development projects).

Cross-Sectional Design (p. 264)

  • Definition: A research design that collects data from participants at a single point in time to explore relationships or differences between variables.
  • Explanation:
    • Useful for identifying associations, not causation.
    • Efficient and cost-effective but may miss changes over time.
  • Key Features:
    • Snapshot approach provides immediate insights.
    • Vulnerable to confounding variables that cannot be tracked over time.
  • Examples:
    • A survey measuring dietary habits and BMI in different age groups.
    • A study examining smartphone usage patterns by income level.
    • Public opinion surveys on climate change conducted within a single week.

Internet Survey (p. 273)

  • Definition: Surveys distributed and completed online, typically via email, websites, or social media platforms.
  • Explanation:
    • Allows for rapid data collection across large, diverse populations.
    • May introduce biases such as non-representative sampling or self-selection.
  • Key Features:
    • Cost-effective and time-efficient.
    • Flexible design options, including multimedia integration.
  • Examples:
    • Employee satisfaction surveys distributed through internal company portals.
    • Social media polls gauging interest in new products.
    • Academic research surveys hosted on platforms like Qualtrics or Google Forms.

14. Descriptive Statistics

  • Mean, Central Tendency (p. 361)
  • Median, Central Tendency (p. 362)
  • Mode, Central Tendency (p. 364)
  • Range, Dispersion (p. 366)
  • Variance, Dispersion (p. 366)
  • Standard Deviation, Dispersion (p. 368)
  • Histogram, Distribution (p. 372)
  • Normal Distribution (p. 373)
  • Outliers, Distribution (p. 376)
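
All of the measures listed above can be computed with Python's standard statistics module. A dependency-free sketch using rounded, hypothetical Job_Performance scores (the first ten rows of the sample data shown later in this deck):

```python
import statistics

# Hypothetical, rounded Job_Performance scores and satisfaction categories.
scores = [9.63, 10.57, 13.72, 14.81, 8.89, 10.52, 24.01, 15.03, 10.81, 16.89]
labels = ["High", "Low", "High", "High", "Low", "Low", "High", "High", "Low", "High"]

mean = statistics.mean(scores)           # central tendency: arithmetic average
median = statistics.median(scores)       # middle value; robust to outliers like 24.01
mode = statistics.mode(labels)           # most frequent category ("High", 6 of 10)
value_range = max(scores) - min(scores)  # dispersion: distance between extremes
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
stdev = statistics.stdev(scores)         # standard deviation, in the data's own units

print(f"mean={mean:.2f} median={median:.2f} mode={mode} "
      f"range={value_range:.2f} sd={stdev:.2f}")
```

Note how the single high score (24.01) pulls the mean above the median, which is exactly the outlier sensitivity the chapter discusses.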

Mean, Central Tendency (p. 361)

Median, Central Tendency (p. 362)

Mode, Central Tendency (p. 364)

Range, Dispersion (p. 366)

Variance, Dispersion (p. 366)

Standard Deviation, Dispersion (p. 368)

Histogram, Distribution (p. 372)

Normal Distribution (p. 373)

Outliers, Distribution (p. 376)

15. Principles of Inferential Statistics

  • Null Hypothesis (p. 392)
  • Statistical Significance (p. 395)
  • Type I Error (p. 397)
  • Type II Error (p. 400)
  • Variance Explained (p. 404)
  • Bivariate Analysis (p. 405)
  • Chi-Square (p. 406)
  • One-Way ANOVA (p. 411)
  • Correlation (Pearson’s r) (p. 413)
  • Linearity (p. 415)

Sample Data, Job Performance

   Job_Performance  Job_Satisfaction  Job_Satisfaction_Binary  Organizational_Support_Binary
0         9.628835          5.993428                     High                            Low
1        10.569730          4.723471                      Low                            Low
2        13.719371          6.295377                     High                            Low
3        14.809352          8.046060                     High                            Low
4         8.885464          4.531693                      Low                            Low
5        10.519744          4.531726                      Low                           High
6        24.005538          8.158426                     High                           High
7        15.031775          6.534869                     High                           High
8        10.814470          4.061051                      Low                           High
9        16.893897          6.085120                     High                            Low
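
For the analyses that follow, the table can be represented directly in Python. A dependency-free sketch using the rounded values above (a pandas DataFrame would serve the same purpose):

```python
# The slide's sample rows, rounded, as plain Python records.
rows = [
    {"perf": 9.63,  "sat": 5.99, "sat_bin": "High", "support_bin": "Low"},
    {"perf": 10.57, "sat": 4.72, "sat_bin": "Low",  "support_bin": "Low"},
    {"perf": 13.72, "sat": 6.30, "sat_bin": "High", "support_bin": "Low"},
    {"perf": 14.81, "sat": 8.05, "sat_bin": "High", "support_bin": "Low"},
    {"perf": 8.89,  "sat": 4.53, "sat_bin": "Low",  "support_bin": "Low"},
    {"perf": 10.52, "sat": 4.53, "sat_bin": "Low",  "support_bin": "High"},
    {"perf": 24.01, "sat": 8.16, "sat_bin": "High", "support_bin": "High"},
    {"perf": 15.03, "sat": 6.53, "sat_bin": "High", "support_bin": "High"},
    {"perf": 10.81, "sat": 4.06, "sat_bin": "Low",  "support_bin": "High"},
    {"perf": 16.89, "sat": 6.09, "sat_bin": "High", "support_bin": "Low"},
]

# Split performance scores by the binary satisfaction grouping.
high = [r["perf"] for r in rows if r["sat_bin"] == "High"]
low = [r["perf"] for r in rows if r["sat_bin"] == "Low"]
print(sum(high) / len(high), sum(low) / len(low))
```

The group means already hint at the relationship the inferential tests below will examine.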

Null Hypothesis (p. 392)

  • Assumes no effect or relationship exists (e.g., Job Satisfaction has no effect on Job Performance).
  • Provides a baseline for statistical testing.
  • Example: Test if Job Satisfaction Binary impacts Job Performance.
  • Rejecting the null suggests the effect is likely real.
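
One way to make the null hypothesis concrete is a permutation test: if satisfaction truly has no effect, the High/Low labels are interchangeable. A sketch using the rounded sample values (pure Python, fixed seed):

```python
import random

random.seed(42)

perf = [9.63, 10.57, 13.72, 14.81, 8.89, 10.52, 24.01, 15.03, 10.81, 16.89]
labels = ["High", "Low", "High", "High", "Low", "Low", "High", "High", "Low", "High"]

def mean_diff(values, labels):
    high = [v for v, l in zip(values, labels) if l == "High"]
    low = [v for v, l in zip(values, labels) if l == "Low"]
    return sum(high) / len(high) - sum(low) / len(low)

observed = mean_diff(perf, labels)

# Under the null hypothesis the labels are exchangeable: shuffle them many
# times and count how often a difference at least this large arises by chance.
count = 0
n_perm = 5000
for _ in range(n_perm):
    shuffled = labels[:]
    random.shuffle(shuffled)
    if mean_diff(perf, shuffled) >= observed:
        count += 1

p_value = count / n_perm
print(f"observed diff = {observed:.2f}, one-sided p = {p_value:.3f}")
```

A small p-value means the observed group difference is rare under the null, which is the logic behind rejecting it.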

Statistical Significance (p. 395)

  • Indicates observed results are unlikely due to chance (e.g., p < 0.05).
  • Determines whether effects are statistically reliable.
  • Example: Chi-square test for Job Satisfaction Binary and Job Performance.
  • Significance ≠ practical importance.

Type I Error (p. 397)

  • Occurs when rejecting a true null hypothesis (false positive).
  • Example: Concluding Organizational Support Binary affects Job Performance when it does not.
  • Controlled by setting a significance level (e.g., α = 0.05).
  • Use robust methods to minimize Type I errors.

Type II Error (p. 400)

  • Failing to reject a false null hypothesis (false negative).
  • Example: Missing a real effect of Job Satisfaction Binary on Job Performance.
  • Reduced by increasing sample size or statistical power.
  • Balance Type I and II error risks in study design.

Variance Explained (p. 404)

  • Proportion of variability in the dependent variable explained by the independent variable(s).
  • Example: Job Satisfaction explains 45% of variance in Job Performance.
  • R² indicates the strength of the relationship.
  • Higher variance explained = better model fit.

Bivariate Analysis (p. 405)

  • Examines the relationship between two variables (e.g., Job Satisfaction and Job Performance).
  • Methods: scatterplots, Pearson’s r, and simple regression.
  • Example: Positive correlation between Job Satisfaction and Job Performance.
  • Helps assess significant associations before multivariate analysis.

Chi-Square (p. 406)

  • Tests independence between categorical variables.
  • Example: Relationship between Job Satisfaction Binary and Organizational Support Binary.
  • Compares observed vs. expected frequencies.
  • Suitable for contingency tables with large enough samples.
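
The chi-square statistic can be computed by hand from a contingency table. A sketch using counts tallied from the ten sample rows (far too few cases for valid inference, as the slide notes; illustration only):

```python
import math

# Hypothetical 2x2 counts of Job_Satisfaction_Binary (rows) by
# Organizational_Support_Binary (columns), tallied from the sample slide.
observed = [[4, 2],   # Satisfaction High: Support Low, Support High
            [2, 2]]   # Satisfaction Low

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed[i][j] - expected) ** 2 / expected

# For 1 degree of freedom the p-value has a closed form via erfc.
p = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With expected cell counts this small a real analysis would need more data (or an exact test); the point here is only the observed-versus-expected mechanics.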

One-Way ANOVA (p. 411)

  • Tests if means of a continuous variable differ across groups of a categorical variable.
  • Example: Does Job Performance differ by Job Satisfaction Binary?
  • Outputs F-statistic and p-value to assess group differences.
  • Use post-hoc tests for specific group comparisons.
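
The F-statistic behind a one-way ANOVA comes straight from sums of squares. A sketch with the rounded sample values, grouped by Job_Satisfaction_Binary:

```python
# One-way ANOVA by hand: does mean Job_Performance differ between the
# hypothetical High- and Low-satisfaction groups from the sample slide?
groups = {
    "High": [9.63, 13.72, 14.81, 24.01, 15.03, 16.89],
    "Low": [10.57, 8.89, 10.52, 10.81],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-group sum of squares: spread of scores around their own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A larger F means between-group variation dominates within-group variation; the p-value would come from the F distribution with these degrees of freedom.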

Correlation (Pearson’s r) (p. 413)

  • Measures the linear relationship between two continuous variables.
  • Example: Correlation between Job Satisfaction and Job Performance.
  • r near +1 or −1 indicates a strong positive or negative correlation.
  • Assumes linearity and no significant outliers.
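
Pearson's r is the covariance scaled by the product of the two standard deviations. A sketch with the rounded sample values:

```python
import math

# Rounded Job_Satisfaction (x) and Job_Performance (y) values from the sample slide.
x = [5.99, 4.72, 6.30, 8.05, 4.53, 4.53, 8.16, 6.53, 4.06, 6.09]
y = [9.63, 10.57, 13.72, 14.81, 8.89, 10.52, 24.01, 15.03, 10.81, 16.89]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Pearson's r: covariance divided by the product of the standard deviations.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(f"r = {r:.2f}, variance explained r^2 = {r * r:.2f}")
```

Squaring r gives the variance-explained figure discussed earlier in the chapter.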

Linearity (p. 415)

  • Assumes a straight-line relationship between variables.
  • Example: Job Satisfaction and Job Performance should appear linear.
  • Non-linearity can bias results; check scatterplots.
  • Use transformations or alternative models if needed.

16. Multivariate Inferential Statistics

  • Statistical Control (p. 428)
  • Spuriousness (p. 429)
  • Interaction (p. 431)
  • ANCOVA (p. 435)
  • MANOVA (p. 436)
  • Unstandardized Coefficients (p. 442)
  • Standardized Coefficients (p. 443)
  • R² (p. 444)
  • Logistic Regression (p. 445)
  • Odds Ratio (p. 446)
  • Statistical Test (p. 447)
  • Statistical Analysis (p. 450)

Statistical Control (p. 428)

  • Control: Isolates the effect of an independent variable by holding other variables constant.
  • Example: Controlling for Organizational Support Binary when analyzing Job Satisfaction’s effect on Job Performance.
  • Spuriousness: Apparent relationships between variables that arise from a common third variable.
  • Example: If both Job Satisfaction and Job Performance correlate with Organizational Support, the observed relationship might be spurious.

Spuriousness (p. 429)

Interaction (p. 431)

  • Definition: Occurs when the effect of one independent variable on the dependent variable changes depending on another variable.
  • Example: The impact of Job Satisfaction on Job Performance may differ by Organizational Support Binary.
  • Purpose: Reveals combined effects of predictors, improving model depth.
  • Visualization: Interaction plots can show differing slopes for groups.

ANCOVA (p. 435)

  • Definition: Combines ANOVA and regression to control for covariates while testing group differences.
  • Example: Examining Job Performance differences by Job Satisfaction Binary, controlling for Organizational Support.
  • Strength: Reduces variability from confounders, increasing test sensitivity.
  • Output: Provides adjusted means and F-statistic for group effects.

MANOVA (p. 436)

  • Definition: Extends ANOVA to analyze multiple dependent variables simultaneously.
  • Example: Assessing how Job Satisfaction Binary affects Job Performance and Organizational Support together.
  • Purpose: Detects patterns across variables, reducing Type I error risk.
  • Output: Reports Wilks’ Lambda for overall model significance.

Unstandardized Coefficients (p. 442)

  • Definition: Reflect the change in the dependent variable for a one-unit change in an independent variable.
  • Example: A coefficient of 2.3 indicates a 2.3-unit increase in Job Performance per unit of Job Satisfaction.
  • Usage: Useful for practical interpretation in original measurement units.
  • Limitation: Cannot compare effects across variables with different scales.
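
The unstandardized coefficient in a simple regression is the least-squares slope. A sketch with the rounded sample values (the 2.3 on the slide is illustrative; these toy data give a different number):

```python
# Unstandardized slope from a simple regression of Job_Performance (y) on
# Job_Satisfaction (x), using the rounded sample-slide values.
x = [5.99, 4.72, 6.30, 8.05, 4.53, 4.53, 8.16, 6.53, 4.06, 6.09]
y = [9.63, 10.57, 13.72, 14.81, 8.89, 10.52, 24.01, 15.03, 10.81, 16.89]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope: b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = sum((a - c) * 0 for a, c in []) or sum(
    (a - mx) * (c - my) for a, c in zip(x, y)
) / sum((a - mx) ** 2 for a in x)
intercept = my - b * mx
print(f"y = {intercept:.2f} + {b:.2f} * x")  # b: performance units per satisfaction unit
```

Here b reads directly in the original units: each one-point rise in satisfaction predicts roughly b more performance points.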

Standardized Coefficients (p. 443)

  • Definition: Reflect the relative strength of predictors on the dependent variable by standardizing scales.
  • Example: A standardized coefficient of 0.7 suggests a stronger effect of Job Satisfaction than a coefficient of 0.3 for Organizational Support.
  • Purpose: Enables direct comparison of predictor importance.
  • Output: Expressed as beta coefficients in regression.
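
In a bivariate regression, the standardized coefficient is the slope rescaled by the ratio of the standard deviations, which reduces to Pearson's r. A sketch with the rounded sample values (the 0.7 vs 0.3 contrast on the slide is illustrative):

```python
import math

# Standardizing the simple-regression slope: beta = b * (sd_x / sd_y).
x = [5.99, 4.72, 6.30, 8.05, 4.53, 4.53, 8.16, 6.53, 4.06, 6.09]
y = [9.63, 10.57, 13.72, 14.81, 8.89, 10.52, 24.01, 15.03, 10.81, 16.89]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
ss_x = sum((a - mx) ** 2 for a in x)
ss_y = sum((c - my) ** 2 for c in y)

b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / ss_x
beta = b * math.sqrt(ss_x / ss_y)  # the (n - 1) terms in the two sds cancel
print(f"beta = {beta:.2f}")
```

Because beta is unit-free, betas from predictors measured on different scales can be compared directly, which the raw slope b cannot do.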

R² (p. 444)

  • Definition: Proportion of variance in the dependent variable explained by the independent variables.
  • Example: R² = 0.65 means 65% of variability in Job Performance is explained by the model.
  • Purpose: Measures model fit and explanatory power.
  • Limitation: High R² does not confirm causality or practical importance.

Logistic Regression (p. 445)

  • Definition: Predicts binary outcomes using independent variables.
  • Example: Predicting High or Low Job Performance based on Job Satisfaction Binary.
  • Output: Reports coefficients as log odds for each predictor.
  • Purpose: Handles classification problems effectively.
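
Logistic regression can be sketched without libraries by minimizing the log-loss with gradient descent. The data below are hypothetical (one satisfaction-style predictor, a 0/1 outcome); in practice one would use a statistics package rather than this hand-rolled loop:

```python
import math

# Minimal logistic regression by gradient descent: predict a binary
# High (1) vs Low (0) outcome from a single hypothetical predictor.
x = [4.06, 4.53, 4.53, 4.72, 5.99, 6.09, 6.30, 6.53, 8.05, 8.16]
yv = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w, b = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    # Gradient of the average log-loss with respect to w and b.
    grad_w = sum((sigmoid(w * xi + b) - yi) * xi for xi, yi in zip(x, yv)) / len(x)
    grad_b = sum((sigmoid(w * xi + b) - yi) for xi, yi in zip(x, yv)) / len(x)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"log-odds slope w = {w:.2f}")  # positive: higher x, higher odds of 1
```

The fitted w is a coefficient in log odds, which is why results are usually reported as odds ratios (next slide) after exponentiating.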

Odds Ratio (p. 446)

  • Definition: Represents the likelihood of an outcome occurring in one group compared to another.
  • Example: Odds ratio > 1 indicates High Satisfaction increases the odds of High Performance.
  • Purpose: Quantifies the impact of predictors in logistic regression.
  • Usage: Useful in risk assessment and binary classification models.
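
An odds ratio falls straight out of a 2x2 table. A sketch with invented counts (not from the sample data):

```python
# Odds ratio from a hypothetical 2x2 table: rows are Satisfaction (High/Low),
# columns are Performance (High/Low). Counts are invented for illustration.
#                 Perf High   Perf Low
high_sat = (30, 20)
low_sat = (15, 35)

odds_high = high_sat[0] / high_sat[1]  # odds of high performance given high satisfaction
odds_low = low_sat[0] / low_sat[1]     # odds given low satisfaction
odds_ratio = odds_high / odds_low
print(f"OR = {odds_ratio:.1f}")  # 3.5: high satisfaction multiplies the odds by 3.5
```

An OR of 1 would mean identical odds in both groups; values above 1 favor the first group, below 1 the second.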

Statistical Test (p. 447)

  • Definition: Assesses whether observed data differ significantly from expectations.
  • Example: A t-test evaluates whether Job Performance differs significantly between High and Low Satisfaction groups.
  • Output: Provides p-values to determine statistical significance.
  • Purpose: Validates hypotheses with inferential methods.

Statistical Analysis (p. 450)

  • Definition: Uses mathematical methods to summarize, explore, and infer from data.
  • Example: Regression analysis examines how Job Satisfaction predicts Job Performance.
  • Methods: Includes descriptive, inferential, and multivariate techniques.
  • Goal: Transforms raw data into actionable insights.

Next steps

Workshop 2, Dec 2

References

Boyle, Michael, and Mike Schmierbach. 2015. Applied Communication Research Methods: Getting Started as a Researcher. Routledge.
———. 2019. Applied Communication Research Methods: Getting Started as a Researcher. Routledge.